Does Size Matter – How Much Data is Required to Train a REG Algorithm?
In this paper we investigate how much data is required to train an algorithm for attribute selection, a subtask of Referring Expression Generation (REG). To enable comparison between different-sized training sets, a systematic training method was developed. The results show that, depending on the complexity of the domain, training on 10 to 20 items may already lead to good performance.
NeuralREG: An end-to-end approach to referring expression generation
Traditionally, Referring Expression Generation (REG) models first decide on the form and then on the content of references to discourse entities in text, typically relying on features such as salience and grammatical function. In this paper, we present a new approach (NeuralREG), relying on deep neural networks, which makes decisions about form and content in one go without explicit feature extraction. Using a delexicalized version of the WebNLG corpus, we show that the neural model substantially improves over two strong baselines. Data and models are publicly available. Comment: Accepted for presentation at ACL 2018.
Data-driven sentence simplification: Survey and benchmark
Sentence Simplification (SS) aims to modify a sentence in order to make it easier to read and understand. To do so, several rewriting transformations can be performed, such as replacement, reordering, and splitting. Executing these transformations while keeping sentences grammatical, preserving their main idea, and generating simpler output is a challenging problem that is still far from solved. In this article, we survey research on SS, focusing on approaches that attempt to learn how to simplify using corpora of aligned original-simplified sentence pairs in English, which is the dominant paradigm nowadays. We also include a benchmark of different approaches on common datasets so as to compare them and highlight their strengths and limitations. We expect that this survey will serve as a starting point for researchers interested in the task and help spark new ideas for future developments.
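Two of the rewriting transformations named in the abstract, replacement and splitting, can be sketched as toy rules. The substitution table and the `", which "` split heuristic are invented for illustration; real data-driven systems learn such operations from aligned corpora rather than hand-coding them.

```python
def replace_words(sentence, substitutions):
    """Lexical replacement: swap complex words for simpler synonyms."""
    for complex_w, simple_w in substitutions.items():
        sentence = sentence.replace(complex_w, simple_w)
    return sentence


def split_sentence(sentence, connective=", which "):
    """Splitting: break a relative clause into a second sentence."""
    if connective in sentence:
        main, rest = sentence.split(connective, 1)
        return main + ". It " + rest
    return sentence


s = "The committee utilized a procedure, which had been approved earlier."
s = replace_words(s, {"utilized": "used"})
s = split_sentence(s)
# s: "The committee used a procedure. It had been approved earlier."
```

The hard part, as the survey notes, is applying such operations while keeping the output grammatical and meaning-preserving, which is why learned models dominate over rule tables like this one.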
Individual Variation in the Choice of Referential Form
This study aims to measure the variation between writers in their choices of referential form by collecting and analysing a new, publicly available corpus of referring expressions. The corpus is composed of referring expressions produced by different participants in identical situations. Results, measured in terms of normalized entropy, reveal substantial individual variation. We discuss the problems and prospects of this finding for automatic text generation applications.
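The normalized-entropy measure used in the abstract can be sketched as Shannon entropy over the distribution of forms chosen by writers, divided by the maximum entropy for the number of distinct forms, giving a value in [0, 1] (0 = all writers agree, 1 = choices are uniformly spread). The form labels below are toy data, not from the study's corpus.

```python
import math
from collections import Counter


def normalized_entropy(choices):
    """Shannon entropy of the observed form distribution,
    normalized by log2 of the number of distinct forms."""
    counts = Counter(choices)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    h = -sum(p * math.log2(p) for p in probs)
    max_h = math.log2(len(counts)) if len(counts) > 1 else 1.0
    return h / max_h


# Forms chosen by five writers for the same referent (toy data):
forms = ["pronoun", "proper name", "pronoun", "description", "pronoun"]
score = normalized_entropy(forms)  # substantial, but not maximal, variation
```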